(2.4.3) IBM G5

Timothy J. Slegel, et al. IBM's S/390 G5 Microprocessor, IEEE Micro, Mar/Apr 1999, pp. 12-23.

S/390 G5 processor : 1999
     CMOS took 10 years to reach the performance of the last bipolar system
     L2 runs at half the core clock (500 MHz core clock at 1.7 V)
     full-custom design
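The clock relationship can be made concrete with a little arithmetic. The 500 MHz core clock and the half-speed L2 come from the notes; the 3-cycle L2 access (`L2_ACCESS_L2_CYCLES`) is a made-up value used only to show how a latency counted in L2 cycles converts to core cycles.

```python
CORE_MHZ = 500.0               # core clock from the notes
L2_MHZ = CORE_MHZ / 2          # L2 runs at half the core clock

def cycle_ns(mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / mhz

core_ns = cycle_ns(CORE_MHZ)   # core cycle time
l2_ns = cycle_ns(L2_MHZ)       # L2 cycle time

# Hypothetical: an L2 access lasting 3 L2 cycles, expressed in core cycles.
L2_ACCESS_L2_CYCLES = 3
core_cycles = L2_ACCESS_L2_CYCLES * l2_ns / core_ns

print(core_ns, l2_ns, core_cycles)  # 2.0 4.0 6.0
```

Every L2 cycle costs two core cycles, so any latency quoted in L2 cycles doubles when seen from the core.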

     not superscalar
     ESA/390 : older ISA : numerous, relatively commonly used instructions that take tens to hundreds of clock cycles to execute
                     not load-store
                     complex and difficult to implement : decimal data instructions, addressing modes, multiple address spaces, precise interrupts, VM emulation and 2 different floating-point architectures
     G5 uses millicode 

     L1 cache : buffer control element : the cache itself, cache directory, TLB, address translation logic
     I unit : instruction fetch, decode, address generation, queue of instructions waiting to execute
     E unit : execution, local working copy of the registers
     R unit : recovery unit : checkpointed copy of the entire microarchitected state, plus the timing facility

L1 cache : unified (instructions, operands, millicode data), store-through, 2-way interleaved

>> 256 bytes : chosen cache line size, a compromise between the time to fetch the last byte of the line and the performance gained from larger lines
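A rough model of the "time to the last byte" side of the tradeoff, as a sketch: assume the first chunk of a line arrives after a fixed latency and the rest stream in one bus-width chunk per cycle. `FIRST_CHUNK_LATENCY` and `BUS_BYTES` are hypothetical values, not G5 figures, and the benefit side (spatial locality, hit rate) is workload dependent and not modelled.

```python
FIRST_CHUNK_LATENCY = 10  # hypothetical: cycles until the first chunk arrives
BUS_BYTES = 16            # hypothetical: bytes delivered per cycle afterwards

def last_byte_cycles(line_bytes: int) -> int:
    """Cycles until the last byte of a line has arrived."""
    chunks = line_bytes // BUS_BYTES
    return FIRST_CHUNK_LATENCY + (chunks - 1)

for size in (64, 128, 256, 512):
    print(size, last_byte_cycles(size))
```

With these assumed values the last byte of a 256-byte line arrives at cycle 25 versus cycle 13 for a 64-byte line: the fill cost grows linearly with line size, while the hit-rate benefit flattens out.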

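Absolute address history table : predicts the translated (absolute) address so the cache can be accessed before the TLB lookup completes
TLB : dynamic address translation and access register translation
ART : access register translation; results are cached in the ART lookaside buffer (ALB)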
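A minimal sketch of what the TLB buys, assuming 4 KB pages and using a flat dictionary (`page_table`, hypothetical) as a stand-in for the real DAT segment/page-table walk. The ALB plays the same role for access register translation: it caches ART results so the slow lookup is skipped on a hit. This TLB is unbounded for brevity; a real one has a fixed number of entries and a replacement policy.

```python
PAGE_SHIFT = 12                      # assumed 4 KB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

class TLB:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page -> real frame (stand-in for DAT tables)
        self.entries = {}             # cached translations
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpage, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        frame = self.entries.get(vpage)
        if frame is None:             # miss: take the slow table-walk path once
            self.misses += 1
            frame = self.page_table[vpage]
            self.entries[vpage] = frame
        return (frame << PAGE_SHIFT) | offset

tlb = TLB({0x5: 0x123})
print(hex(tlb.translate(0x5ABC)), hex(tlb.translate(0x5000)), tlb.misses)
```

The second access to the same page hits in the TLB, so only one walk is counted.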

BTB : 2048 entries, 2-way set associative
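How a 2048-entry, 2-way set-associative BTB might be organised: 1024 sets of 2 ways each, with LRU replacement inside a set. The indexing (halfword address modulo the set count, remaining bits as the tag) and the LRU policy are assumptions for illustration, not the G5's documented scheme.

```python
ENTRIES = 2048
WAYS = 2
SETS = ENTRIES // WAYS   # 1024 sets

class BTB:
    def __init__(self):
        # Each set holds (tag, target) pairs, most recently used first.
        self.sets = [[] for _ in range(SETS)]

    def _index_tag(self, addr: int):
        halfword = addr >> 1            # S/390 instructions are halfword aligned
        return halfword % SETS, halfword // SETS

    def lookup(self, addr: int):
        idx, tag = self._index_tag(addr)
        ways = self.sets[idx]
        for i, (t, target) in enumerate(ways):
            if t == tag:
                ways.insert(0, ways.pop(i))   # hit: move entry to MRU position
                return target
        return None                           # miss: no prediction

    def update(self, addr: int, target: int):
        idx, tag = self._index_tag(addr)
        ways = self.sets[idx]
        ways[:] = [e for e in ways if e[0] != tag]  # drop any stale entry
        ways.insert(0, (tag, target))
        del ways[WAYS:]                             # evict the LRU way if over capacity
```

Three branches whose addresses map to the same set (e.g. `0x1000`, `0x1800`, `0x2000` here) compete for its two ways; the least recently used one is evicted.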
Decimal unit for financial data 
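The decimal unit operates on packed decimal operands: two BCD digits per byte, with the sign in the low nibble of the last byte (C for plus and D for minus; A, E, F are also accepted as plus and B as minus). The sketch below only encodes and decodes that format in software; `pack` and `unpack` are illustrative names, not part of any IBM API.

```python
def pack(n: int) -> bytes:
    """Encode an integer as packed decimal with the preferred sign codes."""
    sign = 0xD if n < 0 else 0xC
    nibbles = [int(d) for d in str(abs(n))] + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)          # pad with a leading zero digit to fill whole bytes
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))

def unpack(data: bytes) -> int:
    """Decode packed decimal; B and D in the sign nibble mean negative."""
    nibbles = []
    for byte in data:
        nibbles += [byte >> 4, byte & 0xF]
    *digits, sign = nibbles
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0xB, 0xD) else value

print(pack(12345).hex(), pack(-42).hex(), unpack(pack(-987)))
```

For instance, 12345 packs to the bytes `12 34 5C`, and -42 to `04 2D`.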

Exception handler : 
     I unit tries to determine which instructions may cause an exception
     for such an instruction it uses single-instruction mode : all previous instructions are cleared from the pipeline and this instruction is sent through alone
     at normal speed, detection is a coarse, pessimistic check (it may flag instructions that will not actually fault)
     in single-instruction mode, the actual precise check is made
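The two-level check can be sketched as follows, assuming each instruction carries the result of the fast pessimistic check (`might_fault`) and what the precise check would find (`faults`); both fields and the `Instr`/`run` names are hypothetical. Only flagged instructions pay the single-instruction-mode cost.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    might_fault: bool   # result of the fast, pessimistic check
    faults: bool        # what the precise check in single-instruction mode finds

def run(program):
    """Return (name, mode) for each instruction until an exception is taken."""
    trace = []
    for ins in program:
        # A pessimistic check must never miss a real exception.
        assert ins.might_fault or not ins.faults
        if not ins.might_fault:
            trace.append((ins.name, "normal"))      # full-speed pipelined path
        elif ins.faults:
            trace.append((ins.name, "exception"))   # precise check confirms: interrupt
            break
        else:
            trace.append((ins.name, "single"))      # false alarm: ran alone, no fault
    return trace

prog = [Instr("A", False, False), Instr("B", True, False),
        Instr("C", True, True), Instr("D", False, False)]
print(run(prog))
```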

     E unit has local working copies of the registers
     R unit holds the master copy (architectural state) of the registers in RAM-type arrays
     on commit, results are written to the R unit (protected with ECC)
     the R unit is the checkpoint used for recovery
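A sketch of the E-unit/R-unit split under simplifying assumptions: 16 32-bit registers, a whole-register-file commit, and CRC-32 from `zlib` as a stand-in for ECC (it detects corruption but, unlike real ECC, cannot correct it). `Core`, `commit` and `recover` are hypothetical names.

```python
import zlib

def _check(regs):
    # Stand-in for ECC: CRC-32 detects corruption but cannot correct it.
    return zlib.crc32(b"".join(r.to_bytes(4, "big") for r in regs))

class Core:
    def __init__(self, nregs=16):
        self.master = [0] * nregs                 # R unit: checkpointed architectural copy
        self.master_check = _check(self.master)
        self.working = list(self.master)          # E unit: local working copy

    def execute(self, reg, value):
        self.working[reg] = value & 0xFFFFFFFF    # only the E-unit copy changes

    def commit(self):
        self.master = list(self.working)          # completed results go to the R unit
        self.master_check = _check(self.master)

    def recover(self):
        if _check(self.master) != self.master_check:
            raise RuntimeError("checkpoint corrupted")
        self.working = list(self.master)          # discard uncommitted E-unit state
```

Uncommitted E-unit updates are simply dropped on recovery, which is what makes the R-unit copy usable as a checkpoint.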

Millicode : executing a complex instruction is like a hardware-invoked subroutine call
     > uses a completely independent set of registers
     > also used for service functions : hardware error logs, scrubbing memory for correctable errors, supporting operator console functions and controlling low-level I/O operations
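A toy model of millicode dispatch: simple opcodes are executed directly, while a complex one traps to a routine built from simple steps that keeps its scratch state in a separate register set. The opcodes `LI`, `ADD` and `MOVE` and the `Cpu` class are made up for the example and do not reflect real G5 millicode.

```python
class Cpu:
    def __init__(self):
        self.gr = [0] * 16            # architected general registers
        self.mgr = [0] * 16           # separate millicode registers
        self.mem = bytearray(256)
        # Complex opcodes are handled by millicode routines, not hardwired logic.
        self.millicode = {"MOVE": self._mc_move}

    def execute(self, op, *args):
        if op == "LI":                # hardwired: load immediate
            r, value = args
            self.gr[r] = value
        elif op == "ADD":             # hardwired: register add
            r1, r2 = args
            self.gr[r1] += self.gr[r2]
        elif op in self.millicode:    # complex: call into millicode
            self.millicode[op](*args)
        else:
            raise ValueError(f"unknown opcode {op}")

    def _mc_move(self, dst_reg, src_reg, length):
        # Byte-by-byte loop; the counter lives in a millicode register,
        # so the program's general registers are left untouched.
        self.mgr[0] = 0
        while self.mgr[0] < length:
            i = self.mgr[0]
            self.mem[self.gr[dst_reg] + i] = self.mem[self.gr[src_reg] + i]
            self.mgr[0] += 1

cpu = Cpu()
cpu.mem[10:14] = b"abcd"
cpu.execute("LI", 1, 10)      # source address
cpu.execute("LI", 2, 100)     # destination address
cpu.execute("MOVE", 2, 1, 4)  # copy 4 bytes via millicode
```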
Virtual machine emulation
     hardware support : 3 complete copies of all architected control registers and 3 copies of the timing facility registers (host mode, first-level guest and second-level guest)
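Keeping three full copies of the control registers means a switch between host and guest levels just selects a different bank instead of saving and reloading registers. A minimal sketch, assuming 16 control registers per level; `ControlRegs` and `Level` are hypothetical names.

```python
from enum import IntEnum

class Level(IntEnum):
    HOST = 0
    GUEST1 = 1     # first-level guest
    GUEST2 = 2     # second-level guest

class ControlRegs:
    def __init__(self, nregs=16):
        self.banks = [[0] * nregs for _ in Level]   # one full copy per level
        self.level = Level.HOST

    def switch(self, level: Level):
        self.level = level    # no save/restore: just select another bank

    def read(self, n: int) -> int:
        return self.banks[self.level][n]

    def write(self, n: int, value: int):
        self.banks[self.level][n] = value
```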

2 floating-point architectures : hexadecimal (HFP, the original S/390 format) and IEEE binary (BFP)
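The two formats interpret the same 32 bits differently. For short HFP the value is (-1)^s * 0.f * 16^(c-64), with a 7-bit excess-64 characteristic c and a 24-bit fraction f; short BFP is IEEE 754 single precision, decoded here with Python's `struct` module. A sketch for illustration; `hfp_short` and `bfp_short` are hypothetical names.

```python
import struct

def hfp_short(word: int) -> float:
    """Decode a 32-bit hexadecimal floating-point (HFP) short value."""
    sign = -1.0 if word >> 31 else 1.0
    characteristic = (word >> 24) & 0x7F              # excess-64 exponent, base 16
    fraction = (word & 0xFFFFFF) / float(1 << 24)     # 0.f with 6 hex digits
    return sign * fraction * 16.0 ** (characteristic - 64)

def bfp_short(word: int) -> float:
    """Decode a 32-bit IEEE 754 binary (BFP) single-precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

print(hfp_short(0x41100000), bfp_short(0x41100000))
```

For example, `0x41100000` is 1.0 as HFP but 9.0 as BFP, so the hardware must know which architecture an instruction refers to.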

symmetric multiprocessor : uniform memory access time

lots of reliability features
     error checking, parity, state checking, local duplication of control logic and so on : 20-30% of the logic is error checking/correction!
     I unit and E unit are fully duplicated; if the signals from the two copies don't match, hardware error recovery is invoked
     in a multiprocessor system : a delete mechanism removes a failed processor completely
     memory : redundant word lines automatically replace defective sections (even at the customer site)
     processor availability facility (PAF) : scans out the latches of a check-stopped processor and stores the state in a special area; the OS then resumes that work on another processor
     PAF + concurrent sparing => completely transparent to the customer. Impressive.
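The duplicate-and-compare plus sparing flow can be sketched as below. Assumptions: each unit copy is modelled as a pure function, a mismatch triggers one retry (standing in for recovery from the R-unit checkpoint), and a persistent mismatch check-stops the processor so its pending work moves to the next spare (standing in for PAF + concurrent sparing). All names are hypothetical.

```python
class CheckStop(Exception):
    """Raised when retry cannot clear a mismatch between duplicated units."""

def step(copy_a, copy_b, x, retries=1):
    # Both copies of the unit see the same input; their outputs are compared.
    for _ in range(retries + 1):
        a, b = copy_a(x), copy_b(x)
        if a == b:
            return a
        # Mismatch: hardware would restore the R-unit checkpoint and retry;
        # here we simply call both copies again.
    raise CheckStop(x)

def run_with_sparing(processors, inputs):
    """processors: list of (copy_a, copy_b) pairs; index 0 is active, the rest are spares."""
    active, results = 0, []
    for x in inputs:
        while True:
            try:
                results.append(step(*processors[active], x))
                break
            except CheckStop:
                # Check-stop: the pending work moves to the next spare processor.
                active += 1
                if active == len(processors):
                    raise
    return results

def double(x):
    return 2 * x

def broken(x):            # a copy with a permanent fault
    return 2 * x + 1

print(run_with_sparing([(double, broken), (double, double)], [1, 2, 3]))
```

The faulty processor check-stops on the first input, and the spare completes all three without the caller seeing an error.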